A Better Decision Tree: The Max-Cut Decision Tree with Modified PCA Improves Accuracy and Running Time

Abstract

Decision trees are a widely used method for classification, both alone and as the building blocks of multiple different ensemble learning methods. The Max Cut decision tree introduced here involves novel modifications to a standard, baseline variant of a classification decision tree, CART Gini. One modification involves an alternative splitting metric, Max Cut, based on maximizing the distance between all pairs of observations that belong to separate classes and lie on separate sides of the threshold value. The other modification, Node Means PCA, selects the decision feature from a linear combination of the input features constructed using an adjustment to principal component analysis (PCA) locally at each node. Our experiments show that this node-based, localized PCA with the Max Cut splitting metric can dramatically improve classification accuracy while also significantly decreasing computational time compared to the CART Gini decision tree. These improvements are most significant for higher-dimensional datasets. For the example dataset CIFAR-100, the modifications enabled a 49% improvement in accuracy, relative to CART Gini, while providing a $6.8 \times$ speed-up compared to the Scikit-Learn implementation of CART Gini. These modifications are expected to advance the capabilities of decision trees for difficult classification tasks.
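
The Max Cut splitting criterion described in the abstract can be illustrated with a small sketch. This is a naive, quadratic-time reading of the abstract's wording for a single numeric feature, not the authors' implementation; the function names `max_cut_score` and `best_threshold`, the use of absolute difference as the distance, and the choice of midpoints as candidate thresholds are all assumptions made for illustration.

```python
from itertools import combinations


def max_cut_score(values, labels, threshold):
    """Sum |x_i - x_j| over pairs that differ in class AND fall on
    opposite sides of the threshold (illustrative reading of Max Cut)."""
    score = 0.0
    for (xi, yi), (xj, yj) in combinations(zip(values, labels), 2):
        different_class = yi != yj
        opposite_sides = (xi <= threshold) != (xj <= threshold)
        if different_class and opposite_sides:
            score += abs(xi - xj)
    return score


def best_threshold(values, labels):
    """Evaluate midpoints between consecutive distinct feature values
    and return the (threshold, score) pair with the highest score."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        # A constant feature offers no split.
        return None, 0.0
    scored = ((t, max_cut_score(values, labels, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: two well-separated classes on one feature.
values = [0.0, 1.0, 4.0, 5.0]
labels = [0, 0, 1, 1]
threshold, score = best_threshold(values, labels)
```

On this toy data the threshold between the two class clusters scores highest, because every cross-class pair is separated by it. Unlike Gini impurity, a criterion of this kind rewards splits in proportion to how far apart the separated observations are, not only how pure the resulting sides are.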

Similar resources

Decision Tree with Better Ranking

AUC (Area Under the Curve) of ROC (Receiver Operating Characteristics) has recently been used as a measure of the ranking performance of learning algorithms. In this paper, we present a novel probability estimation algorithm that improves the AUC value of decision trees. Instead of estimating the probability at the single leaf into which the example falls, our method averages probability estimates...

An Algorithm for Better Decision Tree

The present paper aims at constructing the decision tree for a given database which adopts an improved ID3 decision tree algorithm to implement data mining in order to predict the output. The database is generated using the sampling techniques and the classification algorithm is applied on the samples. The obtained results are compared with experimental results in order to verify the validity a...

Predicting Twist Condition by Bayesian Classification and Decision Tree Techniques

Railway infrastructures are among the most important national assets of countries. Most of the annual budget of infrastructure managers is spent on repairing, improving and maintaining railways. The best repair method should consider all economic and technical aspects of the problem. In recent years, data analysis of maintenance records has contributed significantly to minimizing the costs. B...

P155: Differential Diagnosis of Panic Attacks: Using a Decision Tree

Panic attacks are discrete episodes of intense fear or discomfort accompanied by symptoms such as palpitations, shortness of breath, sweating, trembling, derealization and a fear of losing control or dying. Although panic attacks are required for a diagnosis of panic disorder, they also occur in association with a host of other disorders listed in the 5th version of the diagnostic and statistica...

A Decision Tree for Technology Selection of Nitrogen Production Plants

Nitrogen is produced mainly from its most abundant source, the air, using three processes: membrane, pressure swing adsorption (PSA) and cryogenic. The most common method for evaluating a process is to use selection diagrams based on feasibility studies. Since the selection diagrams are presented by different companies, they are biased and provide dissimilar and even contradictory results. I...

Journal

Journal title: SN Computer Science

Year: 2022

ISSN: 2661-8907, 2662-995X

DOI: https://doi.org/10.1007/s42979-022-01147-4